Multi-label Text Classification of German Language Medical Documents

Authors

  • Stephan Spat
  • Bruno Cadonna
  • Ivo Rakovac
  • Christian Gütl
  • Hubert Leitner
  • Günther Stark
  • Peter Beck
Abstract

Background and Objective: At nearly every patient visit, medical documents are produced and stored in a medical record, often in unstructured form as free text. The growing number of stored documents increases the need for effective and timely retrieval of information. We developed a multi-label classification system to categorize German-language free-text medical documents (e.g. discharge letters, clinical findings, reports) into predefined classes. A random sample of 1,500 free-text medical documents was retrieved from a general hospital information system, and each document was manually assigned to between 1 and 8 categories by a domain expert. This sample was used to train and evaluate four classification schemes: Naïve Bayes, kNN, SVM and J48. Additional tests examined the effect of text preprocessing. In our study, preprocessing improved performance, and the best results were obtained with J48 classification.
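To make the multi-label setup concrete, the following is a minimal sketch, not the authors' system. It assumes a binary-relevance decomposition (one independent yes/no classifier per category), which the abstract does not specify, and uses a small hand-written multinomial Naïve Bayes. The function names, the four toy documents and the category names are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter (keeps ä, ö, ü, ß)."""
    return [t for t in re.split(r"[^a-zäöüß]+", text.lower()) if t]


class BinaryNaiveBayes:
    """Multinomial Naive Bayes for a single yes/no label, with Laplace smoothing."""

    def fit(self, token_lists, targets):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}
        for tokens, target in zip(token_lists, targets):
            self.counts[target].update(tokens)
            self.docs[target] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, tokens, target):
        total_docs = self.docs[True] + self.docs[False]
        # Smoothed class prior.
        score = math.log((self.docs[target] + 1) / (total_docs + 2))
        denom = sum(self.counts[target].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.counts[target][tok] + 1) / denom)
        return score

    def predict(self, tokens):
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def train_binary_relevance(documents, label_sets):
    """Train one independent binary classifier per category (assumed strategy)."""
    token_lists = [tokenize(d) for d in documents]
    all_labels = sorted(set().union(*label_sets))
    return {
        label: BinaryNaiveBayes().fit(
            token_lists, [label in labels for labels in label_sets]
        )
        for label in all_labels
    }


def predict_labels(models, document):
    """Return the set of categories whose classifier votes 'yes'."""
    tokens = tokenize(document)
    return {label for label, model in models.items() if model.predict(tokens)}


# Hypothetical toy data; not taken from the study's 1,500-document sample.
TRAIN_DOCS = [
    "Entlassungsbrief Kardiologie Herzinsuffizienz",
    "Entlassungsbrief Radiologie Röntgen Thorax",
    "Befund Radiologie Röntgen Thorax",
    "Befund Kardiologie Echokardiographie Herzinsuffizienz",
]
TRAIN_LABELS = [
    {"discharge_letter", "cardiology"},
    {"discharge_letter", "radiology"},
    {"finding", "radiology"},
    {"finding", "cardiology"},
]

models = train_binary_relevance(TRAIN_DOCS, TRAIN_LABELS)
print(sorted(predict_labels(models, "Befund Röntgen Thorax")))
```

Because each category gets its own classifier, a document can receive any number of labels, which matches the 1 to 8 categories per document described above. Any of the other three schemes could be substituted for the per-label classifier without changing the surrounding structure.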


Similar resources

Enhanced Information Retrieval from Narrative German-language Clinical Text Documents using Automated Document Classification

The amount of narrative clinical text documents stored in Electronic Patient Records (EPR) of Hospital Information Systems is increasing. Physicians spend a lot of time finding relevant patient-related information for medical decision making in these clinical text documents. Thus, efficient and topical retrieval of relevant patient-related information is an important task in an EPR system. This...


Prototype of a Medical Information Retrieval System for Electronic Patient Records: Finding Relevant Information in Clinical Text Documents

The Steiermärkische Krankenanstalten Ges.m.b.H. (KAGes) conducted the roll-out of an electronic patient record (EPR) system in 2004. This system contains an increasing amount of unstructured clinical text documents in German. In order to facilitate patient-related medical decision-making for physicians, this diploma thesis analyses and implements methods for retrieving relevant medical...


Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...


Multi-label Classification of Product Reviews Using Structured SVM

Most text classification problems are associated with multiple class labels, and hence automatic text classification is one of the most challenging and prominent research areas. Text classification is the problem of categorizing text documents into different classes. In the multi-label classification scenario, each document may have more than one label. The real challenge in ...


A Multi-label Text Classification Framework: Using Supervised and Unsupervised Feature Selection Strategy

Text classification, the task of assigning metadata to documents, requires significant time and effort when performed by humans. Moreover, with online-generated content growing explosively, manual annotation of large-scale, unstructured data becomes a challenge. Currently, many state-of-the-art text mining methods have been applied to the classification process, many of them based on the key wor...




Publication date: 2007